Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This paper discusses unsupervised learning of mixture models using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the effectiveness of search and inference of mixture parameters using the proposed approach, we select two key probability distributions, each handling fundamentally different types of data: the multivariate Gaussian distribution to address mixture modelling of data distributed in Euclidean space, and the multivariate von Mises-Fisher (vMF) distribution to address mixture modelling of directional data distributed on a unit hypersphere. The key contributions of this paper, in addition to the general search and inference methodology, include the derivation of MML expressions for encoding the data using multivariate Gaussian and von Mises-Fisher distributions, and the analytical derivation of the MML estimates of the parameters of the two distributions. Our approach is tested on simulated and real-world data sets. For instance, we infer vMF mixtures that concisely explain experimentally determined three-dimensional protein conformations, providing an effective null model description of protein structures that is central to many inference problems in structural bioinformatics. The experimental results demonstrate that our proposed search and inference method, together with the encoding schemes, improves on state-of-the-art mixture modelling techniques.